language description
Adversarial Ranking for Language Generation
Generative adversarial networks (GANs) have achieved great success in synthesizing data. However, existing GANs restrict the discriminator to be a binary classifier, which limits their learning capacity for tasks that require synthesizing output with rich structure, such as natural language descriptions. In this paper, we propose a novel generative adversarial network, RankGAN, for generating high-quality language descriptions. Rather than training the discriminator to learn and assign an absolute binary predicate to each individual data sample, RankGAN analyzes and ranks a collection of human-written and machine-written sentences given a reference group. By viewing a set of data samples collectively and evaluating their quality through relative ranking scores, the discriminator is able to make better assessments, which in turn helps to learn a better generator. RankGAN is optimized through the policy gradient technique. Experimental results on multiple public datasets clearly demonstrate the effectiveness of the proposed approach.
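The relative ranking idea above can be illustrated with a small sketch. The abstract does not specify the scoring function, so everything here is an assumption. Each sentence is represented by a feature vector, and its relevance to a reference sentence is their cosine similarity. Relevance is turned into a ranking probability by a softmax with temperature `gamma` over the comparison set, and the final score is the log-probability averaged over the reference group. The names `ranking_scores`, `cosine`, and `gamma`, and the toy vectors, are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_scores(
    candidates: List[Sequence[float]],
    references: List[Sequence[float]],
    gamma: float = 1.0,
) -> List[float]:
    """Hypothetical relative ranking score for each candidate sentence.

    For every reference vector u, the candidates compete through a softmax
    over gamma * cos(s, u). Each candidate's score is its log-probability
    averaged over the reference group.
    """
    totals = [0.0] * len(candidates)
    for u in references:
        logits = [gamma * cosine(s, u) for s in candidates]
        # Numerically stable log-softmax over the whole comparison set.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(logit - m) for logit in logits))
        for i, logit in enumerate(logits):
            totals[i] += logit - log_z
    return [t / len(references) for t in totals]


if __name__ == "__main__":
    human = [1.0, 0.0]    # toy embedding of a human-written sentence
    machine = [0.0, 1.0]  # toy embedding of a machine-written sentence
    reference_group = [[0.9, 0.1], [0.8, 0.3]]
    print(ranking_scores([human, machine], reference_group, gamma=5.0))
```

Because the softmax normalizes over the whole comparison set, a sentence's score depends on the other candidates it is ranked against. That dependence is what makes the assessment relative rather than absolute. A larger `gamma` sharpens the contrast between closely and loosely matching sentences.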
MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
As an extension of the traditional visual single object tracking (SOT) task [2, 3, 4], vision-language tracking (VLT) can harness the complementary advantages of multiple modalities. Vision-language trackers (VLTs) therefore have the potential to achieve better tracking performance, and VLT has recently attracted widespread attention [5, 6, 7, 8].
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
We automatically annotate the aligned motions with language descriptions that depict the action and the unique interacting objects in the scene, e.g., "sit on the armchair near the desk." HUMANISE thus enables a new generation task: language-conditioned human motion generation in 3D scenes. The proposed task is challenging because it requires joint modeling of the 3D scene, human motion, and natural language.